AutoASC - A System for Automatic Acquisition of Sense Tagged Corpora

نویسندگان

  • Rada Mihalcea
  • Dan I. Moldovan
چکیده

Many natural language processing tasks, such as word sense disambiguation, knowledge acquisition, information retrieval, use semantically tagged corpora. Till recently, these corpus-based systems relied on text manually annotated with semantic tags; but the massive human intervention in this process has become a serious impediment in building robust systems. In this paper, we present AutoASC, a system which automatically acquires sense tagged corpora. It is based on (1) the information provided in WordNet, particularly the word definitions found within the glosses and (2) the information gathered from Internet using existing search engines. The system was tested on a set of 46 concepts, for which 2071 example sentences have been acquired; for these, a precision of 87% was observed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Acquisition of Sense Tagged Corpora

An important problem in Natural Language Processing is identifying thecorrect sense of a word in a particular context. Thus far, statistical methods have been considered the best techniques in word sense disambiguation. Unfortunately, these methods produce high accuracy results only for a small number of preselected words. The reduced applicability of statistical methods is due basically to the...

متن کامل

Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian

Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources including Persian. The shortage of efficient, reliable linguistic res...

متن کامل

Automatic Resolution of Ambiguous Terms Based on Machine Learning and Conceptual Relations in the UMLS

Methods. For a term W that represents multiple UMLS concepts, a collection of MEDLINE abstracts that contain W is extracted. For each abstract in the collection, occurrences of concepts that have relations with W as defined in the UMLS are automatically identified. A sense-tagged corpus, in which senses of W are annotated, is then derived based on those identified concepts. The method was evalu...

متن کامل

Empirical Acquisition Of Differentiating Relations From Definitions

This paper describes a new automatic approach for extracting conceptual distinctions from dictionary definitions. A broad-coverage dependency parser is first used to extract the lexical relations from the definitions. Then the relations are disambiguated using associations learned from tagged corpora. This contrasts with earlier approaches using manually developed rules for disambiguation.

متن کامل

Unsupervised WSD based on Automatically Retrieved Examples: The Importance of Bias

This paper explores the large-scale acquisition of sense-tagged examples for Word Sense Disambiguation (WSD). We have applied the “WordNet monosemous relatives” method to construct automatically a web corpus that we have used to train disambiguation systems. The corpus-building process has highlighted important factors, such as the distribution of senses (bias). The corpus has been used to trai...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • IJPRAI

دوره 14  شماره 

صفحات  -

تاریخ انتشار 2000